Overview of variants

After checking the most prevalent and abundant variants (those appearing in more than 50 populations with an average allele frequency > 0.2), we also analyzed the variants detected in the populations evolved under each compound. This ensures that we do not miss variants that are compound-specific (and therefore not prevalent) or that have variable allele frequencies (and therefore an average AF <= 0.2 across all populations in which they were detected).
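The two thresholds above can be written down as a small filter. This is only a minimal sketch: the record layout (dicts with `variant`, `population` and `af` keys) and the function name are assumptions made for illustration, not the format used in the actual analysis.

```python
from collections import defaultdict


def split_by_prevalence(records, min_populations=50, min_mean_af=0.2):
    """Split variants into 'prevalent and abundant' vs. everything else.

    records: iterable of dicts with hypothetical keys
    'variant', 'population' and 'af' (allele frequency).
    """
    afs = defaultdict(dict)  # variant -> {population: af}
    for r in records:
        afs[r["variant"]][r["population"]] = r["af"]

    prevalent, remaining = set(), set()
    for variant, by_pop in afs.items():
        # Mean AF is taken over the populations where the variant was detected.
        mean_af = sum(by_pop.values()) / len(by_pop)
        if len(by_pop) > min_populations and mean_af > min_mean_af:
            prevalent.add(variant)
        else:
            remaining.add(variant)
    return prevalent, remaining
```

The `remaining` set is what the compound wise sections then look at: variants that are either seen in few populations or have a low average AF.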

Skip list according to gene wise variant results

We filtered out variants around the following genes, because they appeared in many compounds and were already discussed in the gene wise variant analysis: peg.962 (detected in ALE 2.0); peg.1114 (in fact closer to peg.1113, a lipoprotein); peg.1351; peg.1554 (detected in parental and control samples with AF == 1); peg.1416 and peg.1417; peg.1537; and peg.1555.
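A skip list like this is easy to apply as a set lookup. The sketch below assumes each variant record carries a hypothetical `gene` field holding the nearest annotated gene; the field name and function name are illustrative only.

```python
# Gene IDs taken from the skip list above.
SKIP_GENES = {
    "peg.962", "peg.1114", "peg.1351", "peg.1416",
    "peg.1417", "peg.1537", "peg.1554", "peg.1555",
}


def drop_skipped(records, skip=SKIP_GENES):
    """Keep only variants whose nearest gene is not on the skip list."""
    return [r for r in records if r["gene"] not in skip]
```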

Variants detected in ALE 2.0

We extracted the variants in ALE 2.0 and compared them with those from the PFNA and PFOA populations in ALE 1.0. No filter was applied on AF (i.e. all variants with AF > 0.05 were kept) or on gene.
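One simple way to make such a comparison is to reduce each experiment to a set of variant keys and take set differences. This is a sketch under assumptions: identifying a variant by `(position, ref, alt)` and the field names used here are illustrative choices, not necessarily how the comparison was actually done.

```python
def variant_keys(records, min_af=0.05):
    """Set of (position, ref, alt) keys for variants above the AF floor."""
    return {
        (r["position"], r["ref"], r["alt"])
        for r in records
        if r["af"] > min_af
    }


def compare_experiments(ale2_records, ale1_records, min_af=0.05):
    """Shared and experiment-specific variants between ALE 2.0 and ALE 1.0."""
    ale2 = variant_keys(ale2_records, min_af)
    ale1 = variant_keys(ale1_records, min_af)
    return {
        "shared": ale2 & ale1,
        "ale2_only": ale2 - ale1,
        "ale1_only": ale1 - ale2,
    }
```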

Variants detected in ALE 2.0 - zoom in peg.319

Gene peg.319 is annotated as beta-glucosidase (one of the 17 copies in the genome; EC 3.2.1.21;Ontology_term=KEGG_ENZYME:3.2.1.21; Glycosyl hydrolase family 3 C-terminal domain protein OS=Bacteroides uniformis in UniProt).

Variants detected in ALE 2.0 - zoom in peg.962

This has been discussed in gene wise analysis (1200664 - 1203843, strand -).

Variants detected in ALE 2.0 - zoom in 1,912 Kb - 1,945 Kb

We then extracted and plotted the hotspot region, from 1,912 to 1,945 Kb. Five copies of SusC-SusD are found in this region. Gene IDs in this region range from peg.1536 to peg.1563.

As we have already analyzed peg.1537 (which is actually supposed to be peg.1536) and peg.1554 (detected in parental and control samples with AF == 1), we filter these two genes out here.
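The region extraction and gene exclusion can be sketched as follows, together with a per-gene tally to see which genes in the window carry the most variants. The record fields (`position`, `gene`) and the function names are hypothetical; the region bounds and excluded gene IDs come from the text above.

```python
from collections import Counter

REGION_START = 1_912_000  # 1,912 Kb
REGION_END = 1_945_000    # 1,945 Kb
ALREADY_ANALYZED = {"peg.1537", "peg.1554"}


def variants_in_region(records, start=REGION_START, end=REGION_END,
                       exclude=ALREADY_ANALYZED):
    """Variants inside [start, end] whose gene is not in the exclude set."""
    return [
        r for r in records
        if start <= r["position"] <= end and r["gene"] not in exclude
    ]


def count_per_gene(records):
    """Number of variant records per gene, most frequent first."""
    return Counter(r["gene"] for r in records).most_common()
```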

As we can see, among the five copies of TonB, the 2nd (peg.1544) and the 5th (peg.1562) have more variants detected. We zoom in on those two genes:

Variants detected in ALE 2.0 - zoom in peg.1962

This gene (peg.1962, 2,402,232 to 2,403,446, + strand) is annotated as a site-specific recombinase belonging to the phage integrase family. The variants in the non-coding region (at 2,403,658) appear to lie after this gene and before peg.1963 (Capsular polysaccharide transcription antitermination protein UpxY family, or NusG or KOW domain-containing protein; 2,403,943 to 2,404,491). NusG is an intrinsic transcription termination factor that stimulates motility and coordinates gene expression with NusA (peg.1227).
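As a quick check of where the intergenic position sits relative to its two neighbours, the distances can be computed directly from the coordinates quoted above. The helper name and tuple layout are illustrative only.

```python
def flank_distances(position, left_gene, right_gene):
    """Distance in bp from an intergenic position to each flanking gene.

    Each gene is a (name, start, end) tuple with start < end.
    """
    left_name, _, left_end = left_gene
    right_name, right_start, _ = right_gene
    if not left_end < position < right_start:
        raise ValueError("position is not between the two genes")
    return {
        left_name: position - left_end,
        right_name: right_start - position,
    }


distances = flank_distances(
    2_403_658,
    ("peg.1962", 2_402_232, 2_403_446),
    ("peg.1963", 2_403_943, 2_404_491),
)
print(distances)  # {'peg.1962': 212, 'peg.1963': 285}
```

So the position lies 212 bp after the end of peg.1962 and 285 bp before the start of peg.1963.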

Variants detected in ALE 2.0 - zoom in peg.2727

This has been discussed in gene wise analysis (3255197 to 3256387, strand -, hypothetical protein).

Variants detected in ALE 2.0 - region after 3.5M

There are a few mutations with AF > 0.5 after 3,500 Kb; however, none of them has replicates. Among these, peg.3135 and peg.3342 have already been plotted and are skipped here.
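A replicate check of this kind can be sketched as below. It assumes that "has replicates" means the same position passes the AF threshold in at least two replicate populations of the same compound; that definition, the field names (`compound`, `replicate`, `position`, `af`) and the function name are all assumptions for illustration.

```python
from collections import defaultdict


def replicated_variants(records, min_position=3_500_000, min_af=0.5,
                        min_replicates=2):
    """High-AF variants past min_position seen in several replicates.

    Returns {(compound, position): sorted list of replicate IDs}.
    """
    seen = defaultdict(set)
    for r in records:
        if r["position"] > min_position and r["af"] > min_af:
            seen[(r["compound"], r["position"])].add(r["replicate"])
    return {
        key: sorted(reps)
        for key, reps in seen.items()
        if len(reps) >= min_replicates
    }
```

An empty result for this region would correspond to the observation above that none of the high-AF mutations is replicated.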

Variants detected for compound Loperamide (following Figure 1a)

Firstly, we checked the overall distribution of variants in compound and control populations.

Among those variants, we decided to focus on the regions where variants with AF > 0.2 were detected, including the coding regions of peg.1417, peg.1562, peg.1703, peg.1704, peg.1976, peg.2523, and peg.2890. There are also two non-coding variants with AF ~ 0.18 around genes peg.393 (Two-component system sensor histidine kinase) and peg.538 (Outer membrane TonB-dependent transporter utilization system for glycans and polysaccharides (PUL) SusC family); however, no replicates were found in those regions.

Loperamide - zoom in peg.1562

This is not very interesting for Loperamide, as the AF in the parental and control strains is even higher than in the Loperamide-evolved populations. This gene was discussed above for ALE 2.0, as the 5th TonB in the 1,912 Kb - 1,945 Kb region.

Loperamide - zoom in peg.1703/1704

These genes encode TrkAH subunits of a K+ channel / transporter and might be related to growth adaptation. A study showed that a trkA mutation depolarizes the membrane compared to the wild type (Zhang et al., 2020). Since both genes are located on the - strand, their coding sequences start at the higher genomic coordinate, so most of the frameshift mutations fall at the beginning of the protein-coding sequences.
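The strand argument can be made explicit with a small coordinate conversion: for a gene on the - strand, the first base of the coding sequence is the highest genomic coordinate. The sketch below uses a hypothetical helper and made-up coordinates in the usage lines; it is not taken from the annotation of peg.1703 or peg.1704.

```python
def cds_offset(position, gene_start, gene_end, strand):
    """1-based offset of a genomic position from the start of the CDS.

    gene_start < gene_end are genomic coordinates; strand is '+' or '-'.
    """
    if not gene_start <= position <= gene_end:
        raise ValueError("position is outside the gene")
    if strand == "+":
        return position - gene_start + 1
    if strand == "-":
        # On the - strand the CDS begins at the high-coordinate end.
        return gene_end - position + 1
    raise ValueError("strand must be '+' or '-'")


# Made-up gene spanning 1000..1999 with a variant at 1990:
print(cds_offset(1990, 1000, 1999, "-"))  # 10
print(cds_offset(1990, 1000, 1999, "+"))  # 991
```

The same genomic position is therefore near the start of the protein on the - strand but near its end on the + strand.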

Loperamide - zoom in peg.1976

The variant at position 2,420,681 is located after gene peg.1976 (Glyco_trans_1_4, Glyco_transf_4) but is found in many samples -> probably not doing anything?

Loperamide - zoom in peg.2523

Synonymous variant in coding region of gene GDP-mannose 4,6-dehydratase.

Loperamide - zoom in peg.2890

Missense variant in coding region of gene putative type IIS restriction/modification enzyme.

Variants detected for compound Ezetimibe and Simvastatin (following Figure 1a)

Firstly, we checked the overall distribution of variants in compound and control populations.

Among those variants, we decided to focus on the regions where variants with AF > 0.2 were detected, including the coding regions of peg.1417, peg.1562, and peg.1976. All of these genes were discussed either in the gene wise variant analysis or in another compound wise variant analysis.

Variants detected for compound Xanthan gum (appeared several times in gene wise variant plots)

Firstly, we checked the overall distribution of variants in compound and control populations.

Among those variants, we decided to focus on the regions where variants with AF > 0.2 were detected, including the coding regions of peg.935, peg.1250, peg.1416, peg.1417, peg.1562, peg.2424, peg.2427, and peg.2855.

Xanthan gum - zoom in peg.935

Gene peg.935 (RHS repeat-associated core domain protein) is on - strand.

Xanthan gum - zoom in peg.2423 and 2427

Genes peg.2423 (YjbH, -), peg.2424 (-), peg.2427 (YjbH, +, tr|R7EIH8|R7EIH8_9BACE, lipoprotein).

Xanthan gum - zoom in peg.2855

Gene peg.2855 (Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14);Ontology_term=KEGG_ENZYME:6.3.4.14) is on + strand.

Variants with AF > 0.1 [[how to decide the cutoff??]]

We also extracted variants with AF > 0.1 and compared them between compounds -> PCoA of these variants? Rows are populations, columns are variants?
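One possible layout for this, sketched under assumptions: rows are populations, columns are variants, cells hold the AF (0 where the variant was not detected above the cutoff), and a pairwise distance matrix is computed as the input to a PCoA. The Bray-Curtis dissimilarity is only one candidate metric chosen here for illustration, the field names are hypothetical, and the ordination step itself is left to a dedicated package.

```python
def af_matrix(records, min_af=0.1):
    """Build a population x variant AF matrix from variants above min_af.

    Populations with no variant above the cutoff do not appear as rows.
    """
    kept = [r for r in records if r["af"] > min_af]
    populations = sorted({r["population"] for r in kept})
    variants = sorted({r["variant"] for r in kept})
    row = {p: i for i, p in enumerate(populations)}
    col = {v: j for j, v in enumerate(variants)}
    matrix = [[0.0] * len(variants) for _ in populations]
    for r in kept:
        matrix[row[r["population"]]][col[r["variant"]]] = r["af"]
    return populations, variants, matrix


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def distance_matrix(matrix):
    """Symmetric matrix of pairwise dissimilarities between rows."""
    return [[bray_curtis(a, b) for b in matrix] for a in matrix]
```

Changing `min_af` and re-running the ordination is one way to see how sensitive the grouping of populations is to the cutoff.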